ABSTRACT. Finding the sparsest solution $a$ of an underdetermined
linear system of equations $Da = s$ is of interest in many applications.
This problem is known to be NP-hard. Recent work studied conditions
on the support size of $a$ that allow its recovery using $\ell_1$-minimization,
via the Basis Pursuit algorithm. These conditions often rely on a
scalar property of $D$ called the mutual coherence. In this work we
introduce an alternative set of features of an arbitrarily given $D$, called the
capacity sets. We show how these can be used to analyze the performance
of Basis Pursuit, leading to improved bounds and predictions
of performance. Both theoretical and numerical methods are presented,
all using the capacity values, and shown to lead to improved assessments
of the success of Basis Pursuit in finding the sparsest solution of $Da = s$.

1. Introduction 

A powerful trend in signal processing that has evolved in recent years
is the use of redundant dictionaries, rather than just bases, for a sparse
representation of signals (images, sound tracks, and more). In such
a setting, we consider a linear equation $s = Da$, where $s$ is a given
signal, $D$ is the representation dictionary, and $a$ is the signal's
representation. The matrix $D$ is a general full-rank $N \times L$ matrix, where
$L > N$, assumed to have $\ell_2$-normalized columns. The number of
non-zero elements in the coefficient vector $a$ is measured by the $\ell_0$-norm,
$\|\cdot\|_0$, on $\mathbb{R}^L$. The goal is to find, within the $(L - N)$-dimensional affine
space of the solutions of this equation, the sparsest representation of
$s$, i.e. one which has the least number of non-zero entries. This goal is
formalized by the following optimization problem:

$(P_0):\ \arg\min_a \|a\|_0 \ \text{ s.t. } \ Da = s.$

In this paper, we consider the signals for which the solution of $(P_0)$ is
unique, and we define $S(D)$ as the family of such signals. We denote
$\Omega = \{1, \ldots, L\}$, and refer to the support of the vector $a = (a_1, \ldots, a_L)^T$
as the set $\Gamma = \mathrm{supp}(a) = \{n \in \Omega \mid a_n \neq 0\}$.

The problem $(P_0)$ is NP-hard, demanding an exhaustive search
over all the subsets of columns of $D$ [16]. One of the most effective
techniques to approximate its solution is the convex relaxation of the
$\ell_0$-norm. It uses the $\ell_1$-norm, the closest convex norm on $\mathbb{R}^L$:

$(P_1):\ \arg\min_a \|a\|_1 \ \text{ s.t. } \ Da = s.$

The solution of $(P_1)$ is carried out by linear programming. We are
interested in signals $s \in S(D)$ for which the solutions of $(P_0)$ and $(P_1)$
coincide. The idea of using $(P_1)$ to find the sparsest solution is called
Basis Pursuit (BP), as coined by Chen, Donoho and Saunders [11[5].
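The reduction of $(P_1)$ to a linear program can be sketched as follows. This is a minimal illustration, not the solver used in the paper: it assumes SciPy's `linprog` with the HiGHS backend, and the helper name `basis_pursuit` and the small random dictionary are hypothetical. The vector $a$ is split as $a = u - v$ with $u, v \ge 0$, so that $\|a\|_1 = \sum (u + v)$ at the optimum.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, s):
    """Solve min ||a||_1 s.t. D a = s through the split a = u - v, u, v >= 0."""
    N, L = D.shape
    c = np.ones(2 * L)                      # objective: sum(u) + sum(v)
    A_eq = np.hstack([D, -D])               # D u - D v = s
    res = linprog(c, A_eq=A_eq, b_eq=s, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:L] - res.x[L:]

# Illustrative data (an assumption of this sketch): a 20 x 40 random dictionary
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 40))
D /= np.linalg.norm(D, axis=0)              # l2-normalized columns
a_true = np.zeros(40)
a_true[[3, 17]] = [1.5, -2.0]               # a 2-sparse representation
a_hat = basis_pursuit(D, D @ a_true)
```

The returned vector always satisfies the constraint and has an $\ell_1$-norm no larger than that of `a_true`; whether it equals `a_true` is exactly the question studied in the paper.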

Let $a$ be a representation of $s$, with support $\Gamma = \mathrm{supp}(a) \subset \Omega$.
The matrix $D_\Gamma$ is a matrix of size $N \times |\Gamma|$ containing the columns (also
referred to as atoms) of $D$ used for the construction of $s$. This matrix
is necessarily full-rank (with rank equal to $|\Gamma|$). Knowing the support $\Gamma$
suffices to enable perfect recovery of $a$, and thus our interest is confined
to the ability to recover the support $\Gamma$.

Definition 1.1. A subset $\Gamma \subset \Omega$ is called $\ell_1$-reconstructible with
respect to the dictionary $D$ if the solution of $(P_1)$ coincides with the
solution of $(P_0)$ for every signal $s \in S(D)$ that admits a representation
with the support $\Gamma$.

The main task of the paper is to obtain conditions on support sizes
which imply that they are $\ell_1$-reconstructible. For any specific support
$\Gamma \subset \Omega$ there exists a straightforward (yet exhaustive) test whether
it admits recovery by BP: simply apply BP to the finite family of
signals $s = Da$ generated from coefficient vectors $a$ with the support $\Gamma$
covering all possible sign patterns (i.e. $2^{|\Gamma|}$ such tests; see footnote 1). If the recovery
succeeds for all these choices of $a$, it will also succeed for any other
representation with support $\Gamma$ [9], [15].
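The exhaustive sign-pattern test for a single support can be sketched as below. This is an illustration under stated assumptions: SciPy's `linprog` stands in for the LP solver, unit-magnitude coefficients are used for each sign pattern, and the names `basis_pursuit` and `is_l1_reconstructible` are hypothetical. The first sign is fixed to $+1$, since $a$ and $-a$ are recovered together, which halves the number of tests.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, s):
    """min ||a||_1 s.t. D a = s, via the split a = u - v with u, v >= 0."""
    L = D.shape[1]
    res = linprog(np.ones(2 * L), A_eq=np.hstack([D, -D]), b_eq=s,
                  bounds=(0, None), method="highs")
    return res.x[:L] - res.x[L:]

def is_l1_reconstructible(D, support, tol=1e-6):
    """Run BP on one signal per sign pattern supported on `support`."""
    support = list(support)
    for signs in itertools.product([1.0, -1.0], repeat=len(support) - 1):
        a = np.zeros(D.shape[1])
        a[support] = (1.0,) + signs         # first sign fixed to +1
        if not np.allclose(basis_pursuit(D, D @ a), a, atol=tol):
            return False
    return True
```

The loop runs $2^{|\Gamma| - 1}$ linear programs, which makes the exponential cost of the exhaustive approach explicit.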

Clearly, such a testing approach is impractical in most cases. If we
aim to find the prospects of success of BP for a fixed cardinality $|\Gamma|$,
this requires a set of tests as described above for each possible support
$\Gamma$ having such a cardinality, and this implies a need for approximately
$L^{|\Gamma|}$ groups of tests. Thus, the exhaustive approach should be replaced
either by a random set of tests with empirical claims, or by a theoretical
study.

Within the theoretical attempts to estimate the power of BP,
two approaches are distinguished in the existing literature. Earlier
work carried out the worst-case analysis for a given dictionary,
providing conditions on the support cardinality that guarantee that any
support satisfying them is $\ell_1$-reconstructible [8], [9], [11], [12], [13], [20]. These
conditions are often very restrictive and far from empirical evidence.
Another, more recent, approach presents a probabilistic analysis,
providing conditions for special families of dictionaries under which most
signals of a given cardinality are $\ell_1$-reconstructible [H 121 |6l [THl [19]. The
results depict a general asymptotic behavior with regard to the sparse
support recovery.

Footnote 1. In fact, half of this amount is required because if $a$ is reconstructible, then
so is $-a$.

In both worst-case and probabilistic-analysis branches of work,
many classical results rely heavily on a scalar feature of the dictionary,
known as the mutual coherence [8], [12], [13], [20]. A related measure also
used is the Babel function [8], [20]. More recent work employs the
Restricted Isometry Property (RIP) [3]. The information carried by all
these measures is very pessimistic; furthermore, the RIP is very
expensive computationally and mainly used for theoretical analysis. In
this work we set out to improve the existing worst-case results for a given
general dictionary $D$, as reported in [8], [12], [13], [20]. We achieve this
progress by replacing the above-mentioned measures with a set of alternative
features that we refer to as the capacity sets of the dictionary. A thorough
computational analysis of $D$ and probabilistic tools are applied to the
problem, leading to improved probabilistic bounds.

In the next section we recall the existing theoretical results
concerning $\ell_1$-recovery as a function of the support cardinality. In section
3 we define two versions of the capacity set and present the main
theoretical results of this paper using these features. Section 4 expands
on the above results by providing two numerical algorithms using the
capacity sets. Section 5 provides an overall comparison of the various
methods presented in this work to assess the performance of BP for
several test cases.

2. Background 

Most known results on sparsity rely on the mutual coherence, denoted
by $\mu$, of the dictionary. This is the maximum of the absolute inner products
between distinct columns: $\mu = \max_{i \neq j \in \Omega} |\langle d_i, d_j \rangle|$. This correlation
between the columns, reflected in its worst value by $\mu$, helps in establishing
the "safe zone" for the support sizes, where both the uniqueness of
the sparsest representation and its $\ell_1$-recovery can be guaranteed.

For $D = [\Phi_1, \Phi_2]$, a pair of orthonormal bases, the following
sufficient condition for $\Gamma$ to be $\ell_1$-reconstructible is proven in [11]:

$|\Gamma| < \frac{\sqrt{2} - 0.5}{\mu}.$

Donoho and Elad in [8] treat a general dictionary $D$. They define the
problem

$(C_\Gamma):\ \max_{\delta \in \mathrm{Null}(D)} \sum_{k \in \Gamma} |\delta_k| \ \text{ s.t. } \ \|\delta\|_1 = 1,$    (2.1)

and show that its solution is intimately tied to the ability to recover
the support $\Gamma$, by the following lemma:

Lemma 2.1. ([8], Lemma 2) A sufficient condition for the support $\Gamma$
to be $\ell_1$-reconstructible is

$\mathrm{val}(C_\Gamma) < \frac{1}{2}.$    (2.2)

This criterion is used to prove the following theorem:

Theorem 2.2. ([8], Theorem 7) A sufficient condition for a support
$\Gamma \subset \Omega$ to be $\ell_1$-reconstructible is

$|\Gamma| < \frac{1}{2}\left(1 + \frac{1}{\mu}\right).$

Typically, the coherence behaves at best like $O(1/\sqrt{N})$, hence the
results stated above predict quite weak $\ell_1$-recovery, which is refuted by
the empirical evidence: usually BP recovers supports of size
proportional to $N$ (and not its square root).
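The coherence and the resulting support-size threshold of Theorem 2.2, $\frac{1}{2}(1 + \frac{1}{\mu})$, are cheap to evaluate. The sketch below is illustrative only: the function names are hypothetical, and the example dictionary $[I, H/2]$ (identity plus a normalized $4 \times 4$ Hadamard basis) is an assumption chosen because its coherence is exactly $1/\sqrt{N}$.

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute inner product between distinct unit-norm columns."""
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)                # ignore <d_i, d_i> = 1
    return float(G.max())

def classical_bound(mu):
    """Support sizes strictly below this value are l1-reconstructible."""
    return 0.5 * (1.0 + 1.0 / mu)

# Pair of orthonormal bases in dimension N = 4: identity and Hadamard / 2
H = np.kron([[1, 1], [1, -1]], [[1, 1], [1, -1]]) / 2.0
D = np.hstack([np.eye(4), H])
mu = mutual_coherence(D)                    # 0.5 = 1 / sqrt(4)
bound = classical_bound(mu)                 # 1.5: only |support| = 1 is covered
```

Even for this perfectly incoherent pair, the worst-case guarantee covers only supports of size 1, which illustrates how restrictive the coherence-based bound is.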

A generalization of the coherence is introduced in [8] and later
used by J. Tropp in [20]: for any $0 < m < L$, the Babel function
$\mu_1(m)$ is defined by

$\mu_1(m) = \max_{|\Lambda| = m} \ \max_{j \in \Omega \setminus \Lambda} \ \sum_{i \in \Lambda} |\langle d_i, d_j \rangle|.$    (2.3)

In terms of this function, a support of size $m$ is proven to be
$\ell_1$-reconstructible provided the following inequality holds [20]:

$\mu_1(m - 1) + \mu_1(m) < 1.$

Unfortunately, in cases where the coherence $\mu$ is close to 1 (implying
the existence of at least one problematic pair of atoms), the growth of
$\mu_1(m)$ is too fast to provide any improvement.

Average-case analysis improves the asymptotic bounds on
reconstructible support sizes. The work in [2] shows that for the dictionary
$D = [I, F^*]$, where $F$ is the Fourier transform, a random uniformly
sampled support admits $\ell_1$-recovery with high probability if (the
expectation of) its cardinality is $O(N / \log N)$, which improves the $O(\sqrt{N})$
estimate of the worst-case approach. For a general orthonormal pair,
it is shown in ([2], Theorem 5.3) that most random supports with
cardinality behaving like $O(1/(\mu^2 \log^6 N))$ admit recovery by BP. The
$\log N$ factor appearing in these expressions is suspected by the authors of
[2] to be unnecessary, which in effect turns this expression into $O(N)$
(for incoherent dictionaries). A similar and related result, exhibiting
the square of the mutual coherence in the denominator of the bound,
appears in [19]. As such, this result is effective in cases where the
dictionary is "uniformly coherent", and the methods employed are not
very suitable for dictionaries with high coherence.
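Returning to the Babel function: for a fixed column $j$, the inner maximization is attained by the $m$ largest absolute inner products with the other columns, so $\mu_1(m)$ can be read off the sorted Gram matrix. The sketch below is a hedged illustration; the names `babel` and `tropp_condition` are hypothetical, and $\mu_1(0) = 0$ is assumed by convention.

```python
import numpy as np

def babel(D, m):
    """Babel function mu_1(m) for 1 <= m < L, with unit-norm columns in D."""
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)                # exclude j itself from the sum
    top = np.sort(G, axis=0)[-m:, :]        # m largest entries of each column
    return float(top.sum(axis=0).max())

def tropp_condition(D, m):
    """Check mu_1(m - 1) + mu_1(m) < 1, taking mu_1(0) = 0."""
    prev = babel(D, m - 1) if m > 1 else 0.0
    return prev + babel(D, m) < 1.0
```

For $m = 1$ the function returns the mutual coherence itself, which is why the Babel-based condition can only improve on the coherence-based one when the large inner products are rare.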

The idea that representations with cardinalities $O(N)$ are
$\ell_1$-reconstructible is supported by the results reported in [6], [7], [10]. This
result is obtained for asymptotically growing dictionaries of size $N \times \delta N$
constructed by concatenating random vectors of unit $\ell_2$-norm,
independently drawn from the uniform distribution. It is shown that all
supports of size up to $\rho(\delta) N$ are $\ell_1$-reconstructible with probability
approaching 1. The work in [7], [10] provides theoretical assessments of
$\rho(\delta)$, based on a connection to the study of neighborly polytopes. Despite
being asymptotic, these results illuminate the empirically supported
evidence regarding the reconstruction abilities of minimal $\ell_0$-norm
supports by linear programming.

As good as these results sound, they do not provide useful
numerical information about the ability of $\ell_1$-reconstruction applied to a
specifically given dictionary $D$ of certain size, which is a practical and
central question in the application of BP. Such information can only be
obtained today by results involving the coherence $\mu$ or its descendants.
Thus, the gap is especially big when the dictionary is not uniformly
coherent and when $\mu \gg 1/\sqrt{N}$.

In this work we introduce new features of the dictionary $D$, the
capacity sets. These features are obtained as the solutions to specific
linear programming problems that probe the dictionary $D$. We consider
two such options: a vector of capacities $q$ and a matrix $Q$, as we
shall explain in detail in the next section. These features are used to
develop a novel analysis of BP performance as a function of the support's
cardinality.

One interesting benefit of the proposed analysis is a better treatment
of dictionaries which are not "uniformly coherent". In cases where
there exists a small set of columns in $D$ with strong linear dependency,
the coherence and the Babel function behave badly, tending to lead to
overly pessimistic bounds. As we show, the use of the capacities leads
in these cases to much better results. Besides that, the capacities are
shown to be more delicate indicators of the dictionary, as reflected in
a better prediction of the BP performance.

The use of capacity sets bridges the gap between purely theoretical
estimates of the reconstructible support sizes for a given dictionary
$D$, which are usually fast to compute but provide pessimistic lower bounds, and
the empirical tests of $D$, which give a very accurate account of the
BP-reconstruction abilities but are computationally prohibitive. We
propose theoretical results and algorithms that employ the capacity sets
to perform a computational assessment of these abilities, which is fast
relative to a full empirical test and more optimistic than known practical
formulae. The question of computational complexity is discussed
in detail in section 5.4.

3. Capacity Sets and Their Use

In this section we define two versions of the capacity sets, and state the
main theoretical results that employ them for the analysis of BP.

3.1 The Capacity Vector q 

The capacity vector consists of elements related to an intermediate tool
used in the proof of Theorem 2.2 in [8]:

Definition 3.1. The capacity vector $q = (q_1, \ldots, q_L)^T$ of a dictionary
$D \in \mathbb{R}^{N \times L}$ is defined for all $k \in \Omega$ by

$q_k = \max_{\delta \in \mathrm{Null}(D)} \delta_k \ \text{ s.t. } \ \|\delta\|_1 = 1.$    (3.1)

Computing the elements of $q$ is relatively easy, and amounts to a
simple set of $L$ independent linear programming problems of the form

$\mathbf{x}_k = \arg\min_{x} \|x\|_1 \ \text{ subject to } \ Dx = 0 \ \text{ and } \ x_k = 1,$

and then assigning $q_k = 1/\|\mathbf{x}_k\|_1$.

To see the equivalence of the two problems, notice that the vector
$\tilde{\mathbf{x}}_k = \mathbf{x}_k / \|\mathbf{x}_k\|_1$ is an element of the null space of $D$ with unit $\ell_1$-norm. Since
$(\mathbf{x}_k)_k = 1$ and $\|\mathbf{x}_k\|_1$ is the smallest possible, the value $q_k = 1/\|\mathbf{x}_k\|_1 =
(\tilde{\mathbf{x}}_k)_k$ is just the solution of (3.1).
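The $L$ linear programs that produce the capacity vector can be sketched as follows. This is a minimal illustration assuming SciPy's `linprog`; the function name `capacity_vector` is hypothetical, and setting $q_k = 0$ when the program is infeasible (column $k$ takes part in no linear dependency) is a convention of this sketch, not a statement from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def capacity_vector(D):
    """q_k = 1 / min{ ||x||_1 : D x = 0, x_k = 1 }, one LP per column k."""
    N, L = D.shape
    c = np.ones(2 * L)                              # ||x||_1 with x = u - v
    b_eq = np.concatenate([np.zeros(N), [1.0]])     # D x = 0 and x_k = 1
    q = np.zeros(L)
    for k in range(L):
        e = np.zeros(L)
        e[k] = 1.0
        A_eq = np.vstack([np.hstack([D, -D]), np.hstack([e, -e])])
        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
        q[k] = 1.0 / res.fun if res.success else 0.0
    return q
```

For the pair $[I, H/2]$ built from the identity and a normalized $4 \times 4$ Hadamard basis, every entry of $q$ equals $1/3$: the null vector obtained by expressing one atom in the other basis has $\ell_1$-norm 3, and no shorter one exists.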

Via Lemma 2.1, the definition of $q$ provides a sufficient condition
$\sum_{k \in \Gamma} q_k < \frac{1}{2}$ on a given support $\Gamma$ to ensure its recovery by
$\ell_1$-minimization. Furthermore, by gathering the $|\Gamma|$ largest entries from
$q$, a simple generalization of Theorem 2.2 can be proposed. However,
in this work we seek a better bound that takes into account the variety
of possible supports, rather than the worst one. One such numerical
technique is suggested in section 4, proposing a special quantization of
the values in $q$ to obtain a lower bound on the fraction of supports of a
given size which admit recovery by BP.

In this section we aim to obtain a more theoretically flavored result
that uses $q$. Denote by $E_q$ the mean value of the capacity vector $q$,
and by $\sigma_q^2$ its variance, $\sigma_q^2 = \frac{1}{L} \sum_{k \in \Omega} (q_k - E_q)^2$. The following theorem uses
these quantities to evaluate the probability of $\ell_1$-reconstruction for a
given support size:

Theorem A. For any $1 \le \ell < \frac{1}{2E_q}$, a support $\Gamma$ of size $\ell$, sampled
uniformly at random from $\Omega$, admits $\ell_1$-recovery with probability

$P(\ell) \ge \frac{\left(\frac{1}{2} - \ell E_q\right)^2}{\ell \sigma_q^2 + \left(\frac{1}{2} - \ell E_q\right)^2}.$    (3.2)

In the special case of a constant capacity vector, the theorem boils
down to a support size threshold of $\frac{1}{2E_q}$, since then the variance becomes
zero. We show in Section 3.2 that a weakened version of Theorem A
yields the classical threshold of $|\Gamma| < \frac{1}{2}\left(1 + \frac{1}{\mu}\right)$ (see Theorem 2.2).
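Once $q$ is known, the bound of Theorem A is a closed-form expression in its mean and variance, $(\frac{1}{2} - \ell E_q)^2 / (\ell \sigma_q^2 + (\frac{1}{2} - \ell E_q)^2)$. The sketch below is illustrative: the function name is hypothetical, and returning 0 outside the range $\ell < 1/(2E_q)$ is a convention of this sketch meaning "no guarantee".

```python
import numpy as np

def theorem_a_bound(q, ell):
    """Lower bound on the probability that a random ell-sized support is recovered."""
    q = np.asarray(q, dtype=float)
    E_q, var_q = q.mean(), q.var()          # population mean and variance
    gap = 0.5 - ell * E_q
    if gap <= 0:                            # ell >= 1 / (2 E_q): theorem is silent
        return 0.0
    return gap**2 / (ell * var_q + gap**2)
```

With a constant vector such as `q = [1/3] * 8`, the bound is 1 for $\ell = 1$ and the function returns 0 for $\ell = 2$, matching the threshold $1/(2E_q) = 1.5$.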
Proof: We fix $\ell$ and choose subsets $\Lambda, \Gamma \subset \Omega$ according to two different
probability models. The elements of $\Gamma$ are chosen uniformly from
$\Omega$ without replacement and form a set of $\ell$ distinct column indices.
The $\ell$ elements of $\Lambda$ are chosen uniformly with replacement (i.e. $\Lambda$
is a multiset of size $\ell$ with possible duplicates). Now, define the random
variables

$x_\ell = \sum_{k \in \Gamma} q_k, \qquad y_\ell = \sum_{m \in \Lambda} q_m.$    (3.3)

In these terms, the probability $P(\ell)$, defined in the statement of the
theorem, is bounded below by $P(x_\ell < \frac{1}{2})$. In turn, we shall bound
the probability $P(x_\ell < \frac{1}{2})$ by means of the Tchebychev inequality,
which involves the mean and the variance of $x_\ell$. These parameters
are easily computable for $y_\ell$: by its definition, we have $E(y_\ell) = \ell E_q$,
$\mathrm{var}(y_\ell) = \ell \sigma_q^2$. Our result is based on the following connection between
the variables $x_\ell$ and $y_\ell$, as shown in Appendix A:

$E(x_\ell) = E(y_\ell) \ \text{ and } \ \mathrm{var}(x_\ell) \le \mathrm{var}(y_\ell).$    (3.4)

Given any real scalar $a > 0$, the one-tailed version of the Tchebychev
inequality [H] for $x_\ell$ reads

$P(x_\ell - E_\ell \ge a \sigma_\ell) = P(x_\ell \ge E_\ell + a \sigma_\ell) \le \frac{1}{1 + a^2},$

where $E_\ell = E(x_\ell)$, $\sigma_\ell^2 = \mathrm{var}(x_\ell)$.

By (3.4), we substitute $E_\ell = \ell E_q$. Also, since a larger variance
implies a lower probability, we put $\sqrt{\ell} \sigma_q$ instead of $\sigma_\ell$ and obtain

$P(x_\ell \ge \ell E_q + a \sqrt{\ell} \sigma_q) \le P(x_\ell \ge E_\ell + a \sigma_\ell) \le \frac{1}{1 + a^2}.$

The parameter $a$ is chosen such that $\ell E_q + a \sqrt{\ell} \sigma_q = \frac{1}{2}$, leading to
$a = (\frac{1}{2} - \ell E_q)/(\sqrt{\ell} \sigma_q)$. Note that the condition $a > 0$ translates to the
requirement $\ell < \frac{1}{2E_q}$, as claimed in the theorem. In case it holds, we
have

$P\left(x_\ell \ge \frac{1}{2}\right) \le \frac{1}{1 + \frac{(\frac{1}{2} - \ell E_q)^2}{\ell \sigma_q^2}},$

or put differently,

$P\left(x_\ell < \frac{1}{2}\right) \ge 1 - \frac{1}{1 + \frac{(\frac{1}{2} - \ell E_q)^2}{\ell \sigma_q^2}} = \frac{(\frac{1}{2} - \ell E_q)^2}{\ell \sigma_q^2 + (\frac{1}{2} - \ell E_q)^2},$

as stated by the theorem. □

3.2 From Capacity Vector to Coherence

We mentioned earlier that previous work often uses the mutual
coherence to derive performance bounds on $\ell_1$-reconstructible supports. The
relation between the capacities in $q$ and the inner products between
the dictionary atoms, $|\langle d_i, d_j \rangle|$, has already been discussed in [8].
Given a dictionary $D$, construct its Gram matrix as $G = D^T D$. Define
the sequence

$\mu_k = \max_{i \neq k} |G_{i,k}| \ \text{ for } k \in \Omega.$    (3.5)

Namely, $\mu_k$ is the maximal value in the $k$-th column of $|G|$, disregarding
the main diagonal entry. As [8] shows, this sequence of values
satisfies

$q_k \le \frac{\mu_k}{\mu_k + 1}.$

Thus the condition $\sum_{k \in \Gamma} q_k < \frac{1}{2}$ can be replaced with $\sum_{k \in \Gamma} \frac{\mu_k}{\mu_k + 1} < \frac{1}{2}$,
leading, of course, to weaker bounds. Further relaxation

$q_k \le \frac{\mu_k}{\mu_k + 1} \le \frac{\mu}{\mu + 1}$    (3.6)

yields a constant capacity vector with entries of size $\frac{\mu}{\mu + 1}$. Applying
Theorem A to this vector we obtain, as a special case, the classical
Theorem 2.2.
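The last step can be checked directly. As a worked illustration, assume the constant capacity vector with entries $\mu/(\mu+1)$, so that $E_q = \mu/(\mu+1)$ and $\sigma_q = 0$; the admissible range of Theorem A and the value of its bound then read:

```latex
\[
\ell \;<\; \frac{1}{2E_q} \;=\; \frac{\mu + 1}{2\mu}
      \;=\; \frac{1}{2}\left(1 + \frac{1}{\mu}\right),
\qquad
P(\ell) \;\ge\;
\frac{\left(\tfrac{1}{2} - \ell E_q\right)^2}
     {0 + \left(\tfrac{1}{2} - \ell E_q\right)^2} \;=\; 1 .
\]
```

Every support below the threshold is therefore recovered with probability 1, which is the deterministic statement of the coherence-based theorem.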

3.3 Using the Capacity Matrix Q 

One problem with the capacity vector $q$ is the independence with which
its entries are computed. This implies that one (or more) of the
entries in $q$ may become unnecessarily large, compared to the values
obtained in Equation (2.1), causing a weaker bound. By working
with pairs of such entries, one could in principle improve the obtained
bounds. This leads us to the following definition:

Definition 3.2. Denote by $\Omega_2$ the set of index pairs $\Omega_2 = \{(i, j) \mid i, j \in
\Omega,\ i < j\}$. The upper triangular capacity matrix $Q = \{Q_{i,j}\}$ is the
matrix with non-zero elements indexed by $(i, j) \in \Omega_2$, defined as follows:

$Q_{i,j} = \max_{\delta \in \mathrm{Null}(D)} \{\max(\delta_i + \delta_j,\ \delta_i - \delta_j)\} \ \text{ s.t. } \ \|\delta\|_1 = 1.$

Each of these entries can be computed by two independent linear
programming problems of the form

$x^{+}_{(i,j)} = \arg\min_x \|x\|_1 \ \text{ subject to } \ Dx = 0 \ \text{ and } \ x_i + x_j = 1,$
$x^{-}_{(i,j)} = \arg\min_x \|x\|_1 \ \text{ subject to } \ Dx = 0 \ \text{ and } \ x_i - x_j = 1,$

and then assigning $Q_{i,j} = 1 / \min(\|x^{+}_{(i,j)}\|_1, \|x^{-}_{(i,j)}\|_1)$.
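The two programs per index pair can be sketched in the same way as for the capacity vector. The code below is an illustration under stated assumptions: SciPy's `linprog` is used as the LP solver, the function name `capacity_matrix` is hypothetical, and an entry is left at 0 when both programs are infeasible.

```python
import numpy as np
from scipy.optimize import linprog

def capacity_matrix(D):
    """Upper triangular Q with Q[i, j] = 1 / min over the '+' and '-' programs."""
    N, L = D.shape
    c = np.ones(2 * L)                              # ||x||_1 with x = u - v
    b_eq = np.concatenate([np.zeros(N), [1.0]])     # D x = 0 and x_i +/- x_j = 1
    Q = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            best = np.inf
            for sign in (1.0, -1.0):
                e = np.zeros(L)
                e[i], e[j] = 1.0, sign
                A_eq = np.vstack([np.hstack([D, -D]), np.hstack([e, -e])])
                res = linprog(c, A_eq=A_eq, b_eq=b_eq,
                              bounds=(0, None), method="highs")
                if res.success:
                    best = min(best, res.fun)
            Q[i, j] = 1.0 / best                    # 0.0 if both LPs are infeasible
    return Q
```

The cost is $L(L-1)$ linear programs, compared with $L$ for the capacity vector, which is the price paid for the tighter pairwise values.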

As in section 3.1, the obtained values $Q_{i,j}$ could be used to form
an improved worst-case bound for Lemma 2.1 and consequently for
Theorem 2.2: Let $\Gamma \subset \Omega$ be a randomly chosen support of size $\ell = 2n$ (see footnote 2).
By definition, the non-zero elements of $Q$ satisfy

$\max_{\delta \in \mathrm{Null}(D),\ \|\delta\|_1 = 1} (|\delta_i| + |\delta_j|) = Q_{i,j} \le \max_{\delta \in \mathrm{Null}(D),\ \|\delta\|_1 = 1} |\delta_i| + \max_{\delta \in \mathrm{Null}(D),\ \|\delta\|_1 = 1} |\delta_j| = q_i + q_j.$

Thus the values $Q_{i,j}$ can be used in the evaluation of an upper bound
on $\mathrm{val}(C_\Gamma)$. To any partition $X$ of $\Gamma$ into disjoint pairs there corresponds the
sum $\sum_{(k_1, k_2) \in X} Q_{k_1, k_2}$ that bounds the value of $C_\Gamma$ from above.
Therefore, $\Gamma$ is $\ell_1$-reconstructible if there exists such a partition satisfying
$\sum_{(k_1, k_2) \in X} Q_{k_1, k_2} < \frac{1}{2}$. Naturally, among all such possible partitions, we
are interested in the one that leads to the smallest sum.

Just one glance at the values of $Q$ gives a lower bound on the sizes of
$\ell_1$-reconstructible subsets: namely, if $\max(Q) < \frac{1}{\ell}$, then a sum of any
$\ell/2$ of its elements does not exceed $1/2$; hence any subset of columns of
size up to $\ell$ is guaranteed to be recovered by BP. Conjecture B below
estimates the uncertainty caused by replacing $\max(Q)$ with $\mathrm{mean}(Q)$.
Some numerical techniques based on $Q$ are described in section 4.

Footnote 2. We consider hereafter even support sizes. Generalization to odd ones is
relatively simple, requiring the use of one entry from $q$. We omit this discussion
for simplicity.

Here we concentrate again on a theoretical bound that uses $Q$,
similar to the one proposed in Theorem A with a few necessary
modifications.

We arrange the values $\{Q_{i,j} \mid i < j \in \Omega\}$ of the capacity matrix
in a vector $Q^v$. Denote by $E_Q$ the mean value of $Q^v$, and by $\sigma_Q^2$ its
variance, $\sigma_Q^2 = \frac{1}{|\Omega_2|} \sum_{i < j \in \Omega} (Q_{i,j} - E_Q)^2$. The following statement
based on $Q$ is similar to the one in Theorem A:

Conjecture B (see footnote 3). For any $1 \le \ell < \frac{1}{E_Q}$, a support $\Gamma$ of even size $\ell$,
sampled uniformly at random from $\Omega$, admits $\ell_1$-recovery with
probability

$P(\ell) \ge \frac{\left(\frac{1}{2} - \frac{\ell}{2} E_Q\right)^2}{\frac{\ell}{2} \sigma_Q^2 + \left(\frac{1}{2} - \frac{\ell}{2} E_Q\right)^2}.$    (3.7)

Notice that the expression obtained in Equation (3.7) is the same as
the one in (3.2), with $\ell$ replaced by $\ell/2$. Since $E_Q$ and $\sigma_Q$ refer to pairs,
if $E_Q = 2E_q$ and $\sigma_Q^2 = 2\sigma_q^2$ the two bounds are the same. However, as
we shall demonstrate in section 5, $E_Q < 2E_q$ and $\sigma_Q^2 < 2\sigma_q^2$ for random
dictionaries, implying that this bound is indeed stronger.
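The conjectured bound, $(\frac{1}{2} - \frac{\ell}{2} E_Q)^2 / (\frac{\ell}{2} \sigma_Q^2 + (\frac{1}{2} - \frac{\ell}{2} E_Q)^2)$, is again a closed-form function of two statistics, now taken over the upper triangular entries of $Q$. The sketch below is illustrative only: the function name is hypothetical, and returning 0 for odd $\ell$ or for $\ell \ge 1/E_Q$ is a convention of this sketch meaning "no claim".

```python
import numpy as np

def conjecture_b_bound(Q, ell):
    """Conjectured lower bound on the recovery probability for even support size ell."""
    Q = np.asarray(Q, dtype=float)
    vals = Q[np.triu_indices(Q.shape[0], k=1)]      # entries with i < j
    E_Q, var_Q = vals.mean(), vals.var()            # population statistics
    gap = 0.5 - 0.5 * ell * E_Q
    if ell % 2 or gap <= 0:
        return 0.0
    return gap**2 / (0.5 * ell * var_Q + gap**2)
```

Comparing this value with the Theorem A bound for the same $\ell$ shows directly how much is gained when $E_Q < 2E_q$.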
Proof: Fix an even support size $\ell$. In order to translate the condition
$\sum_{(i,j) \in X} Q_{i,j} < \frac{1}{2}$ to a probabilistic one, we use again the model
involving a subset $\Gamma \subset \Omega$ of size $\ell$ whose elements are chosen uniformly
from $\Omega$ without replacement. Also, we let $X$ be a random partition of
the index set $\Gamma$ into pairs. Based on these notions, we define a random
variable $x_\ell = \sum_{(k_1, k_2) \in X} Q_{k_1, k_2}$. In effect, $x_\ell$ is a sum of elements of $Q$
randomly chosen "without replacement" in a stronger sense, i.e. not
only are the elements not repeated, but two elements with a common
index are not allowed. The probability $P(\ell)$, defined in the statement of
the conjecture, is bounded below by $P(x_\ell < \frac{1}{2})$. This bound is not tight,
since the support $\Gamma$ is reconstructible if there exists some partition $X^{opt}$
such that $\sum_{(k_1, k_2) \in X^{opt}} Q_{k_1, k_2}$ drops below one half, while $P(x_\ell < \frac{1}{2})$ is
only the probability that this will happen for a random partition $X$.

In order to analyze the variable $x_\ell$, we consider a multiset $\Phi$ of size
$\ell/2$ chosen uniformly with replacement from $Q^v$, and define the random
variable $y_\ell$ to be its sum, $y_\ell = \sum_{Q_{i,j} \in \Phi} Q_{i,j}$. Then we have $E(y_\ell) = \frac{\ell}{2} E_Q$,
$\mathrm{var}(y_\ell) = \frac{\ell}{2} \sigma_Q^2$.

The expectation of $x_\ell$ equals that of $y_\ell$, which is proven in
Appendix B. Regarding the variance, we make an assumption
similar to (3.4):

$\mathrm{var}(x_\ell) \le \mathrm{var}(y_\ell).$    (3.8)

We do not provide its proof and leave it as an open question at
this stage. Empirical verification of this inequality is demonstrated in
Appendix B.

Following the steps of Theorem A, given any real $a > 0$, the one-tailed
version of the Tchebychev inequality [H] for $x_\ell$ reads

$P(x_\ell \ge E_\ell + a \sigma_\ell) \le \frac{1}{1 + a^2}, \quad \text{where } E_\ell = E(x_\ell),\ \sigma_\ell^2 = \mathrm{var}(x_\ell).$

The parameter $a$ is chosen such that $\frac{\ell}{2} E_Q + a \sqrt{\frac{\ell}{2}} \sigma_Q = \frac{1}{2}$, leading
to $a = (\frac{1}{2} - \frac{\ell}{2} E_Q)/(\sqrt{\frac{\ell}{2}} \sigma_Q)$, implying that we should require $\ell < \frac{1}{E_Q}$ to
get $a > 0$. This leads to

$P\left(x_\ell \ge \frac{1}{2}\right) \le \frac{1}{1 + \frac{(\frac{1}{2} - \frac{\ell}{2} E_Q)^2}{\frac{\ell}{2} \sigma_Q^2}},$

or put differently,

$P\left(x_\ell < \frac{1}{2}\right) \ge 1 - \frac{1}{1 + \frac{(\frac{1}{2} - \frac{\ell}{2} E_Q)^2}{\frac{\ell}{2} \sigma_Q^2}},$

as stated in the conjecture. □

Footnote 3. This claim is a conjecture since it relies on a property that is used here
without a proof. More on this is given in Appendix B.



4. Numerical Algorithms 



Given the capacity vector $q$ (or its weaker version as described in section
3.2) or the matrix $Q$, we can use Theorem A and Conjecture B to predict the
$\ell_1$-reconstructible supports, and show lower bounds on the probability
of success as a function of the support size $\ell$. However, we can alternatively
evaluate these probabilities numerically, provided that there are
shortcuts that avoid the exponential growth in support possibilities.
This leads us to the following two algorithms.

4.1 A Fast Combinatorial Count Using q 

Below we propose an algorithm which provides worst-case bounds on
reconstructible support sizes. We would like to establish the fraction
of the total number of supports $\Gamma$ of size $\ell$ that satisfy $\mathrm{val}(C_\Gamma) < \frac{1}{2}$.
Testing the sufficient condition $\sum_{k \in \Gamma} q_k < \frac{1}{2}$ for every single $\Gamma$ requires
$O(L^\ell)$ flops, which is prohibitive. Instead, we propose to perform a
quantization of the entries of $q$ to $d$ distinct values, leading to a more
reasonable computational process.

Suppose we are given a partition $\Lambda = \{\Lambda_i\}_{i=1}^d$ of $\Omega$ into $d$ disjoint
clusters, such that $\Omega = \bigcup_{i=1}^d \Lambda_i$. The corresponding quantized values
in $q$ are denoted by $\{q_\Lambda^i\}$, each set to be the maximal in its subset,
$\{q_\Lambda^i = \max_{k \in \Lambda_i}(q_k) \mid 1 \le i \le d\}$.

Given the quantization parameters $\Lambda = \{\Lambda_i, q_\Lambda^i\}_{i=1}^d$, every $\ell$-sized
support $\Gamma \subset \Omega$ can be described as the union $\bigcup_{i=1}^d \Gamma_i$, where $\Gamma_i \subset \Lambda_i$
is the subset of indices in $\Gamma$ allocated to the quantized value $q_\Lambda^i$. Thus,
the sum $\sum_{k \in \Gamma} q_k$ can be replaced by a larger sum, $\sum_{i=1}^d |\Gamma_i| q_\Lambda^i$.

In order to test all possible supports $\Gamma \subset \Omega$ of size $\ell$, a
combinatorial count of all sequences $p = (p_1, \ldots, p_d)$ is performed, such that
$0 \le |p_i| \le |\Lambda_i|$ and $\sum_{i=1}^d |p_i| = \ell$. For each of these we evaluate
$\sum_{i=1}^d |p_i| q_\Lambda^i$ and count the relative number of those (see footnote 4) below $\frac{1}{2}$. The
complexity of such a computation does not exceed $O(\ell^d)$.

As to the choice of the quantization parameters $\Lambda = \{\Lambda_i, q_\Lambda^i\}_{i=1}^d$,
as said above, we let $q_\Lambda^i = \max_{k \in \Lambda_i} q_k$ to guarantee that the evaluated
summations consider a worst-case scenario. The clustering is
done by an attempt to minimize the function

$f\left(\{\Lambda_i, q_\Lambda^i\}_{i=1}^d\right) = \sum_{i=1}^d \left( |\Lambda_i| q_\Lambda^i - \sum_{k \in \Lambda_i} q_k \right).$    (4.1)

The difference $|\Lambda_i| q_\Lambda^i - \sum_{k \in \Lambda_i} q_k$ is the quantization error for the
elements in the subset $\Lambda_i$, and the above function simply sums these values.

The minimization of $f\left(\{\Lambda_i, q_\Lambda^i\}_{i=1}^d\right)$ can be done exhaustively in
case $d$ is small; in our experiments we have used $d = 3$, implying
that the above requires a number of flops polynomial in $L$. For larger values of $d$, a sequential
algorithm that chooses $\Lambda_i$ can be proposed, separating the set $\Omega$ into two
parts, and proceeding in a tree-like, greedy separation scheme.
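The counting step described above can be sketched as follows. This is a hedged illustration, not the authors' code: the clusters are taken as given (the search that minimizes the quantization error is not implemented), the function names are hypothetical, and each sequence $p$ is weighted by the number of supports it represents, $\prod_i \binom{|\Lambda_i|}{p_i}$.

```python
from itertools import product
from math import comb

def quantized_fraction(q, clusters, ell):
    """Lower bound on the fraction of ell-sized supports with sum of q_k below 1/2.

    clusters: list of disjoint index lists that together cover range(len(q)).
    """
    sizes = [len(c) for c in clusters]
    q_max = [max(q[k] for k in c) for c in clusters]    # worst case per cluster
    good = 0
    for p in product(*(range(n + 1) for n in sizes)):   # p_i indices from cluster i
        if sum(p) != ell:
            continue
        if sum(pi * qi for pi, qi in zip(p, q_max)) < 0.5:
            weight = 1
            for n, pi in zip(sizes, p):
                weight *= comb(n, pi)                   # supports matching p
            good += weight
    return good / comb(len(q), ell)

def quantization_error(q, clusters):
    """Sum over clusters of |cluster| * max(q) - sum(q), the clustering objective."""
    return sum(len(c) * max(q[k] for k in c) - sum(q[k] for k in c)
               for c in clusters)
```

For `q = [0.1, 0.1, 0.3, 0.3]` with clusters `[[0, 1], [2, 3]]` and `ell = 2`, five of the six supports pass the test; merging everything into one cluster drops the estimate to zero, which shows why a small quantization error matters.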

Computationally, the results of the combinatorial count are very
close to those predicted by Theorem A. Therefore, this method serves as
supporting evidence for the probabilistic approach taken in Theorem
A, but its numerical output is omitted from our display of experimental
results in section 5.

Footnote 4. Each instance must be weighted by the number of its possible occurrences.

4.2 A Sampling Algorithm Using Q 

An alternative to Conjecture B is a direct evaluation of $\ell_1$-reconstructible
supports $\Gamma$ of cardinality $\ell$, by the following stages:

• We draw $M \gg L$ such supports $\{\Gamma_j\}_{j=1}^M$.

• For each $\Gamma_j$ we seek a partition $X_j$ that leads to the smallest
value of $\sum_{(k,l) \in X_j} Q_{k,l}$. While finding the best such partition
is combinatorial in complexity, we use an approximate greedy
algorithm of complexity $O(\ell^2 \cdot \log(\ell))$ which computes the
following suboptimal partition:

1. Begin with an empty set $X$ of pairs.

2. Denote by $Q_{res}$ the sub-matrix of $Q$ whose rows and columns
consist of only those indices from $\Gamma$ which do not occur in
$X$. Retrieve the couple $(i_0, j_0), (i_1, j_1)$ of index pairs which
minimizes the sum $Q(i_0, j_0) + Q(i_1, j_1)$ over $Q_{res}$.

3. Join the couple $(i_0, j_0), (i_1, j_1)$ to $X$ and return to item 2
while $Q_{res}$ is nonempty.

Therefore, the algorithm is, in a sense, "second-order greedy",
i.e. at each step the least-sum couple of values from $Q$, rather
than the least single value, is extracted. Possibly, better algorithms
will improve the performance of this scheme, but we believe it
to be quite close to optimal, while keeping low computational
costs. The fact that such a partition can be found in $O(\ell^2 \cdot \log(\ell))$
follows from the next combinatorial claim: let $(i^*, j^*)$ be the
index pair of minimal value in the submatrix of $Q$ supported on $\Gamma$.
Then both $i^*$ and $j^*$ are necessarily present among the indices $\{i_0, j_0, i_1, j_1\}$
defined above.

• Given the partition $X$, test $\sum_{(k,l) \in X} Q_{k,l} < \frac{1}{2}$. Accumulate the
relative number of such occurrences over the collection $\{\Gamma_j\}_{j=1}^M$.
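The stages above can be sketched as follows. This is an illustration under stated assumptions: the two disjoint pairs of least total value are found by brute force at each step (simple, but slower than the scheme the text describes), the function names are hypothetical, and $Q$ is taken to be upper triangular with even support sizes.

```python
import itertools
import numpy as np

def greedy_pair_sum(Q, support):
    """Second-order greedy pairing: repeatedly take two disjoint pairs of least sum."""
    U = np.triu(Q, 1)
    Qs = U + U.T                                    # symmetric copy for lookup
    free = list(support)
    total = 0.0
    while len(free) >= 4:
        pairs = list(itertools.combinations(free, 2))
        best = min(((Qs[a] + Qs[b], a, b)
                    for a, b in itertools.combinations(pairs, 2)
                    if not set(a) & set(b)),        # the two pairs must be disjoint
                   key=lambda t: t[0])
        total += best[0]
        used = best[1] + best[2]
        free = [k for k in free if k not in used]
    if len(free) == 2:                              # one pair left over
        total += Qs[free[0], free[1]]
    return total

def sampled_fraction(Q, ell, M, rng):
    """Fraction of M random ell-sized supports whose greedy pair sum is below 1/2."""
    L = Q.shape[0]
    hits = sum(greedy_pair_sum(Q, rng.choice(L, size=ell, replace=False)) < 0.5
               for _ in range(M))
    return hits / M
```

A call such as `sampled_fraction(Q, 8, 1000, np.random.default_rng(0))` then gives one point of the estimated success curve.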

The fact that this method relies on capacity values implies that the
predicted performance is expected to be weaker compared to the true
behavior of BP. Nevertheless, among the various methods discussed
thus far, this method is expected to be the most optimistic because it
uses $Q$ and not $q$, and also because it does not build the evaluation
through the Tchebychev inequality, which also loses part of the tightness.
However, as opposed to all the other methods described above,
this method cannot claim theoretical correctness of its results.

In light of the similarity of the proposed scheme to the pure
empirical test, we can make a direct comparison of the computational cost
of the two tests. See the details in Section 5.4.

5. Experimental Results 

5.1 Test-Cases to Study 

We carry out a number of tests on each of the following three
dictionaries:

1. D-Random is the dictionary of size $128 \times 256$, which consists
of $\ell_2$-normalized random vectors, independently drawn from the
Normal distribution on the unit sphere. Such a dictionary is
often used in numerical experiments as well as in various
applications.

2. D-Spoiled is the dictionary D-Random, which has undergone
an operation designed to create a small set of columns
with high linear dependence. More precisely, we re-generate a
set of 3 columns as a random linear combination of 12 other
columns. This dictionary is used to demonstrate the ability of
the capacity-sets methods to better handle dictionaries with a
non-uniform distribution of inner products.

3. D-DCT is the orthonormal pair $[I, C^*]$ of size $128 \times 256$, where
$C$ is the 1-dimensional Discrete Cosine basis and $I$ the identity
matrix.
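The three test dictionaries can be generated along the following lines. This is a sketch under assumptions that the text does not fix: Gaussian columns are normalized to unit length, the spoiled columns are renormalized after being replaced, the DCT-II convention is used for $C$, and all function names are hypothetical.

```python
import numpy as np

def d_random(N, L, rng):
    """Columns drawn from a Gaussian and scaled to unit l2-norm."""
    D = rng.standard_normal((N, L))
    return D / np.linalg.norm(D, axis=0)

def d_spoiled(D, rng, n_bad=3, n_src=12):
    """Replace n_bad columns by random combinations of n_src other columns."""
    D = D.copy()
    idx = rng.choice(D.shape[1], size=n_bad + n_src, replace=False)
    bad, src = idx[:n_bad], idx[n_bad:]
    D[:, bad] = D[:, src] @ rng.standard_normal((n_src, n_bad))
    D[:, bad] /= np.linalg.norm(D[:, bad], axis=0)   # keep unit-norm columns
    return D, bad, src

def d_dct(N):
    """Orthonormal pair [I, C^T], with C the orthonormal DCT-II matrix."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] /= np.sqrt(2.0)                          # scale the constant row
    return np.hstack([np.eye(N), C.T])
```

Calling `d_random(128, 256, rng)`, `d_spoiled(...)` on its output, and `d_dct(128)` gives dictionaries of the sizes used in the experiments.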

5.2 Behavior of q and Q 

As explained earlier, the passage from the capacity vector $q$ to the
matrix $Q$ was motivated by the fact that the values $Q_{i,j}$ provide a lower bound
in this context. To exhibit the numerical behavior of these bounds, we
compute the mean and the variance of the family of ratios

$R_{k,l} = \frac{Q_{k,l}}{q_k + q_l} \ \text{ for } k \neq l \in \Omega.$    (5.1)

The mean and variance of these ratios for the three test cases are given
in Table 1.1.

As these figures show, we gain up to 30% of the upper bound value
by upgrading from the capacity vector to the capacity matrix. This ratio
between the two bounds for the corresponding indices is very stable,
as seen from the low values of the standard deviation $\sigma(R)$.

To display the power of Conjecture B, we show that $E_Q < 2E_q$ and
either $\sigma_Q^2 < 2\sigma_q^2$ or $\sigma_Q^2 \ll E_Q$. The corresponding values for various
dictionaries are presented in Table 1.2.

Dictionary     $E(R)$    $\sigma(R)$
D-Random       0.7175    0.0008
D-Spoiled      0.7154    0.001
D-DCT          0.6509    0.0109

Table 1.1
Behavior of the capacity sets $q$ and $Q$ by evaluating the mean and
variance of the ratios.



Dictionary            $E_Q$     $2E_q$    $\sigma_Q^2$   $2\sigma_q^2$
D-Random 32 x 128     0.2329    0.3179    0.5849e-3      0.8252e-3
D-Random 64 x 128     0.1695    0.2345    0.1405e-3      0.1654e-3
D-Random 128 x 256    0.1235    0.1721    0.4511e-4      0.5652e-4
D-DCT 64 x 128        0.1687    0.2586    0.4732e-3      0.0112e-3
D-DCT 128 x 256       0.1265    0.1943    0.4070e-3      0.4144e-5

Table 1.2
Comparison of mean and variance of capacity sets.

Notice that for the D-DCT dictionary the variance of the capacity
vector is smaller than that of the capacity matrix, due to the special
structure of this dictionary. Nevertheless, as seen later in the results
section, Conjecture B predicts BP success on support sizes larger than
those allowed by Theorem A.

5.3 Compared Methods 

We perform a number of computations, applying various methods for
the estimation of BP performance on the given dictionaries. The results
are expressed via a set of Estimation Functions, $EF : \Omega \to \mathbb{R}$,
whose value at $\ell \in \Omega$ is the predicted percentage of $\ell$-sized
supports which admit recovery by $\ell_1$-norm optimization. The EFs
considered are the following:

1. EF-emp - The standard empirical test on the dictionary. This
test is done by drawing 1,000 random supports for each
cardinality $\ell$, generating a corresponding signal, and solving the BP
for each. EF-emp is the relative number of successes in
recovering the support.

2. EF-CB - The classical coherence-based upper bound
$\frac{1}{2}\left(1 + \frac{1}{\mu}\right)$, provided by Theorem 2.2.

3. EF-thmA - Expresses the results of Theorem A, EF-thmA$(\ell) = P(\ell)$,
as defined in the statement of the theorem. The
values are computed from q of the dictionary.

4. EF-thmB - Expresses the results of Conjecture B, computed
from the capacity matrix Q of the dictionary.

5. EF-compB - The results of the sampling algorithm based on
Q, whose results support the estimation of Conjecture B (see
section 4.2).

6. EF-GB - The Grassmannian upper bound, computed by the formula
for the Classical Bound using the ideal coherence
$\mu = \sqrt{\frac{L-N}{N(L-1)}}$.

This last EF deserves more explanation: Among all possible dictionaries
of size $N \times L$, the Grassmannian frame is the one leading to the
smallest possible coherence, $\mu = \sqrt{\frac{L-N}{N(L-1)}}$ [T7]. Thus,
this leads to the most optimistic worst-case bound. When the dictionary
is "un-balanced", implying a large spread of inner products in the Gram
matrix, we know that the mutual-coherence bound deteriorates
dramatically. Thus, by using the Grassmannian Bound, we test what is the
best achievable coherence-based performance behavior for the same
dictionary size.
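
The two coherence-based estimates, EF-CB and EF-GB, can be sketched in a few lines. This is a minimal illustration in plain Python, assuming the dictionary is stored as a list of rows; the helper names are hypothetical, and the ideal coherence uses the standard value $\sqrt{(L-N)/(N(L-1))}$ for an $N \times L$ Grassmannian frame.

```python
import math


def normalize_columns(D):
    """Return a copy of D (list of rows) with unit l2-norm columns."""
    n_rows, n_cols = len(D), len(D[0])
    norms = [math.sqrt(sum(D[i][j] ** 2 for i in range(n_rows)))
             for j in range(n_cols)]
    return [[D[i][j] / norms[j] for j in range(n_cols)] for i in range(n_rows)]


def mutual_coherence(D):
    """Largest absolute inner product between two distinct normalized columns."""
    D = normalize_columns(D)
    n_rows, n_cols = len(D), len(D[0])
    best = 0.0
    for a in range(n_cols):
        for b in range(a + 1, n_cols):
            g = abs(sum(D[i][a] * D[i][b] for i in range(n_rows)))
            best = max(best, g)
    return best


def classical_bound(mu):
    """Coherence-based bound (1/2)(1 + 1/mu) on the recoverable support size."""
    return 0.5 * (1.0 + 1.0 / mu)


def grassmannian_coherence(N, L):
    """Smallest coherence attainable by an N x L dictionary."""
    return math.sqrt((L - N) / (N * (L - 1)))


# Toy dictionary: three unit vectors at 120 degrees in the plane (N = 2, L = 3).
angles = [math.pi / 2 + 2 * math.pi * k / 3 for k in range(3)]
D = [[math.cos(a) for a in angles], [math.sin(a) for a in angles]]
mu = mutual_coherence(D)
mu_ideal = grassmannian_coherence(2, 3)
```

For this toy frame the actual and ideal coherence coincide, so EF-CB and EF-GB give the same bound; for an unbalanced dictionary `mu` exceeds `mu_ideal` and EF-CB becomes more pessimistic.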
5.4 Complexity Analysis of the Methods

We argue for the usefulness of capacity-based numerical algorithms for
the evaluation of a given dictionary D. To that end, we consider the
computational complexity of each method listed in the previous section.

1. EF-emp - The standard empirical test of D is conducted as
follows: for each support size $\ell$, pick $M \gg L$ random subsets $\Gamma$
of columns of size $\ell$. For each $\Gamma$, generate a signal with a
random coefficient vector supported on $\Gamma$ and test whether BP
recovers the support. Since in practice the maximal relevant size $\ell$ is
proportional to L, the computational complexity of this test is
$O(M \cdot L \cdot C_{LP}(L))$, where $C_{LP}(L)$ denotes the complexity of
a linear programming algorithm for a problem of size L.

2. EF-CB requires the computation of $\mu$, which takes $O(L \cdot N)$
flops.

3. EF-thmA - To employ the results of Theorem A, the capacity
vector q is computed in $O(L \cdot C_{LP}(L))$, and then for each $\ell$
the probability $P(\ell)$, defined in the statement of Theorem A, is
computed in $O(L)$. Overall complexity: $O(L^2 + L \cdot C_{LP}(L)) =
O(L \cdot C_{LP}(L))$.

4. EF-thmB - To employ the results of Conjecture B, the capacity
matrix Q is computed in $O(L^2 \cdot C_{LP}(L))$, and then for each $\ell$
the probability $P(\ell)$, defined in the statement of Conjecture B, is
computed in $O(L^2)$. Overall complexity: $O(L^3 + L^2 \cdot C_{LP}(L)) =
O(L^2 \cdot C_{LP}(L))$.

5. EF-compB - Our heaviest (and best-performing) algorithm
conducts a semi-empirical test: for each support size $\ell$, pick
$M \gg L$ random subsets of columns of size $\ell$, and employ
the analysis detailed in section 4.2. The computational cost of treating
a single support is $O(\ell^2 \cdot \log(\ell))$. Overall complexity is
$O(L^2 \cdot C_{LP}(L) + M \cdot L^3 \cdot \log(L))$.

As seen from the analysis above, only EF-compB has non-negligible
computational complexity. When comparing EF-emp and EF-compB,
we can concentrate on the relative complexities of the linear programming
solver versus the cost of the partition algorithm per support, and the
benefit of the latter is evident.
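
The sampling loop behind EF-emp (item 1 above) can be sketched as follows. This is a hedged skeleton only: the BP solve is abstracted as a caller-supplied predicate `recovers`, which is a hypothetical stand-in for the linear programming step, and drawing Gaussian coefficients is our assumption rather than something fixed by the text.

```python
import random


def ef_emp(num_columns, sizes, trials, recovers, seed=0):
    """Fraction of random supports of each size for which `recovers` is True.

    `recovers(support, coeffs)` is a placeholder for "solve BP on the signal
    built from these coefficients and check that the support is recovered".
    """
    rng = random.Random(seed)
    result = {}
    for size in sizes:
        successes = 0
        for _ in range(trials):
            # Random support of the requested cardinality.
            support = sorted(rng.sample(range(num_columns), size))
            # Random coefficients on that support (Gaussian is an assumption).
            coeffs = [rng.gauss(0.0, 1.0) for _ in support]
            if recovers(support, coeffs):
                successes += 1
        result[size] = successes / trials
    return result


def toy_recovers(support, coeffs):
    """Dummy predicate for illustration: pretend recovery works up to size 3."""
    return len(support) <= 3


curve = ef_emp(num_columns=16, sizes=[1, 2, 3, 4, 5], trials=50,
               recovers=toy_recovers)
```

The cost of the loop is dominated by the calls to `recovers`, one LP solve per sampled support, which is the source of the $M \cdot L \cdot C_{LP}(L)$ term.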

5.5 Comparison Results 

Figure 1 presents the graphs of the various EF functions
described above, for the three dictionaries described at the top of this
section. As we see from the left-side graphs in the figures, for all the
dictionaries the empirically established support size which admits BP
recovery is at least 40 columns. Note that this relative number of columns
is also predicted in [TU]; however, this holds true only asymptotically
(for dictionaries of growing sizes) and for specific random dictionaries.

Returning to statements which hold for our modest size of 128 x 
256, we notice that the estimation made by the sampling algorithm 
based on the Capacity Matrix (EF-compB) is much better than the 
Classical bound, established so far in the literature. The difference is 
especially high for the D-Spoiled dictionary, which reflects the fact that 
methods based on capacity sets manage well the non-uniform distribu- 
tion of inner products. 

On the right side of each figure we display the various methods
developed in this work. Noticeably, the results of Conjecture B (EF-thmB)
are stronger than those of Theorem A (EF-thmA), which is explained
by the benefit of using the Capacity Matrix rather than the Capacity
Vector. This benefit is expressed in the ratio values given in Tables
1.1 and 1.2 and explained thereafter. Apparently, Conjecture B does not
express the full power of the Capacity Matrix estimation, since the
sampling algorithm based on its values (EF-compB) outperforms
EF-thmB by 15 to 20%. This algorithm produces values which are quite
close to the Grassmannian Bound, the best possible bound one can hope
to obtain using coherence-based estimation for the given dictionary size.
We do not have enough information to explain the fact that the values of
EF-compB and of the Grassmannian bound nearly coincide for all the
dictionaries discussed here (and additional ones examined during the work).
Discovering the reason underlying this connection may lead to
important insights regarding the Basis Pursuit performance.

Figure 1: Estimation Functions for various dictionaries of size 128 x 256.

Appendix A 

We prove Claim 3.4.

Theorem C. For the two random variables, $x_\ell$ and $y_\ell$, defined in 3.3,
the following relations between the first and second moments hold:

$$E(x_\ell) = E(y_\ell) \quad \text{and} \quad \mathrm{var}(x_\ell) \le \mathrm{var}(y_\ell). \qquad \text{(A-1)}$$

Proof: We begin by introducing some notation. Fix the support size
$1 \le \ell \le L$. For any $1 \le k \le \ell$, we denote by $\mathcal{C}_\ell^{k}$
the collection of all $\ell$-sized non-ordered multisets of indices from
$\Omega$ (with repetitions), which have precisely k distinct elements each.
For instance, {1, 4, 5, 4, 7} and {5, 1, 7, 4, 4} are two distinct elements
of $\mathcal{C}_5^{4}$. Such a multiset will sometimes be referred to as an
"index set". Also, we define $\mathcal{D}^n = \mathcal{C}_\ell^{\ell} \cup
\mathcal{C}_\ell^{\ell-1} \cup \dots \cup \mathcal{C}_\ell^{\ell-n}$, the
collection of all $\ell$-sized multisets having at least $\ell - n$
distinct elements.

In this notation, $x_\ell$ is a random variable with uniform distribution
over the domain $\mathcal{D}^0$, which admits the value $\sum_{k \in \Lambda} q_k$
on a given element $\Lambda \in \mathcal{D}^0$. The variable $y_\ell$ has the
same definition on a larger domain $\mathcal{D}^{\ell-1}$, containing the
domain of $x_\ell$. Therefore, we treat both $x_\ell$ and $y_\ell$
as restrictions of the same uniformly distributed random variable x to
the corresponding domains: $x_\ell = x|_{\mathcal{D}^0}$,
$y_\ell = x|_{\mathcal{D}^{\ell-1}}$. In the proof we use
the following basic property of the variance:

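Proposition 5.1. Let z be a random variable defined over a domain
given as the disjoint union $\mathcal{D} = \mathcal{D}_1 \cup \mathcal{D}_2 \cup
\dots \cup \mathcal{D}_n$, with uniform distribution. Denote
$v = \mathrm{var}(z|_{\mathcal{D}})$, $v_i = \mathrm{var}(z|_{\mathcal{D}_i})$,
$s_i = |\mathcal{D}_i|$. Then

$$v = \frac{\sum_{i=1}^{n} s_i v_i}{\sum_{i=1}^{n} s_i}.$$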
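
The size-weighted averaging of variances in Proposition 5.1 can be checked numerically. The sketch below assumes the parts share a common mean, as is the case in this proof once the expectations are centered at zero; the function name and the toy data are hypothetical.

```python
from statistics import pvariance


def pooled_variance(parts):
    """Size-weighted average of the population variances of the parts."""
    total = sum(len(p) for p in parts)
    return sum(len(p) * pvariance(p) for p in parts) / total


# Two disjoint parts with the same (zero) mean; values made up for illustration.
parts = [[-1.0, 1.0], [-2.0, 0.0, 2.0]]
union = [z for p in parts for z in p]

v_union = pvariance(union)          # variance over the whole domain
v_pooled = pooled_variance(parts)   # weighted average of the part variances
```

If the part means differed, a between-part term would have to be added to `v_pooled`, so the equal-mean assumption is essential for this check.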

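Part 1. The expectation of the random variable x restricted to
$\mathcal{D}^0$ is computed by

$$E(x|_{\mathcal{D}^0}) = \frac{1}{|\mathcal{D}^0|} \sum_{\Lambda \in \mathcal{D}^0} \sum_{k \in \Lambda} q_k.$$

This sum contains $|\mathcal{D}^0| \cdot \ell$ elements, and for each
$j \in \Omega$, $q_j$ appears in it the same number of times. Therefore,
each $q_j$ appears $|\mathcal{D}^0| \frac{\ell}{L}$ times, and we have
$E(x|_{\mathcal{D}^0}) = \frac{\ell}{L} \sum_{k \in \Omega} q_k = \ell \cdot E_q$.
The mean of $x|_{\mathcal{D}^{\ell-1}}$ is computed similarly:

$$E(x|_{\mathcal{D}^{\ell-1}}) = \frac{1}{|\mathcal{D}^{\ell-1}|} \sum_{\Lambda \in \mathcal{D}^{\ell-1}} \sum_{k \in \Lambda} q_k.$$

Here each $q_j$ appears $|\mathcal{D}^{\ell-1}| \frac{\ell}{L}$ times, and we
have $E(x|_{\mathcal{D}^{\ell-1}}) = \frac{\ell}{L} \sum_{k \in \Omega} q_k =
\ell \cdot E_q$.

This proves our first claim, $E(x_\ell) = E(y_\ell)$. For the rest of the
proof, where only the variance of the two variables is considered, we
assume w.l.o.g. that the expectation of $x_\ell$ and $y_\ell$ is zero (in
light of the equality $\mathrm{var}(z) = \mathrm{var}(z - E(z))$ for any
random variable z), that is

$$E_q = 0.$$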
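
The equality of expectations established in Part 1 can be verified by brute force on a small example. The sketch assumes, following the proof, that $x_\ell$ is uniform over $\ell$-subsets of distinct indices and $y_\ell$ is uniform over all $\ell$-sized non-ordered multisets; the toy capacity vector `q` is made up.

```python
from itertools import combinations, combinations_with_replacement
from statistics import mean


def restricted_means(q, size):
    """Means of the sum of q over all subsets and over all multisets of `size`."""
    idx = range(len(q))
    # x: index sets without repetitions.
    x_vals = [sum(q[k] for k in s) for s in combinations(idx, size)]
    # y: non-ordered index multisets, repetitions allowed.
    y_vals = [sum(q[k] for k in s)
              for s in combinations_with_replacement(idx, size)]
    return mean(x_vals), mean(y_vals)


q = [0.0, 0.0, 3.0]       # toy capacity vector, E_q = 1
ell = 2
mean_x, mean_y = restricted_means(q, ell)
expected = ell * mean(q)  # ell * E_q
```

Both enumerated means agree with $\ell \cdot E_q$, as the symmetry argument predicts.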

Part 2. We consider the extension of x, defined so far on a domain
comprising distinct $\ell$-sized index sets, to a domain where each
such set may appear any finite number of times. x still has a uniform
distribution over this collection. Thus, a disjoint union of two or more
(not necessarily distinct) index sets is a sub-domain to which x may
be restricted.

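For any $0 < n < \ell$, we define two disjoint unions

$$\mathcal{A}_n = \bigsqcup_{\Gamma} \{\Gamma \cup \{j\} \mid j \in \Gamma\},$$
$$\mathcal{B}_n = \bigsqcup_{\Gamma} \{\Gamma \cup \{j\} \mid j \in \Omega\}.$$

(In the definition of $\mathcal{A}_n$, the set $\Gamma \cup \{j\}$ is added to
the collection one time for each appearance of j in $\Gamma$.)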

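Let $\Lambda \in \mathcal{C}_\ell^{k}$ be a set which contains distinct
indices $j_1, \dots, j_k$ with multiplicities $m_1, \dots, m_k$ (so that
$\sum_{i=1}^{k} m_i = \ell$). For each $1 \le i \le k$, $\Lambda$
is obtained in $\mathcal{A}_n$ $m_i - 1$ times in the form
$\Gamma \cup \{j_i\}$ for an appropriate $\Gamma = \Gamma_i \in
\mathcal{C}_{\ell-1}^{k}$ (this claim also holds vacuously for $m_i = 1$).
Therefore, the number of copies of $\Lambda$ in $\mathcal{A}_n$ equals
$\sum_{i=1}^{k} (m_i - 1) = \ell - k$. Also, $\Lambda$ appears in
$\mathcal{B}_n$ precisely once for each of $j_1, \dots, j_k$, in the form
$\Gamma \cup \{j_i\}$ (for an appropriate $\Gamma = \Gamma_i$ each time).
Therefore, $\mathcal{B}_n$ contains k copies of $\Lambda$.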
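
The counting argument above can be illustrated on a concrete multiset. The sketch below enumerates the ways of writing a multiset as $\Gamma \cup \{j\}$ and applies the two counting rules; the function names are hypothetical.

```python
from collections import Counter


def decompositions(multiset):
    """All ways to write the multiset as Gamma plus one index j.

    Returns a list of (j, Gamma) pairs, one per distinct index j, where Gamma
    is a Counter holding the remaining multiplicities.
    """
    counts = Counter(multiset)
    out = []
    for j in counts:
        gamma = counts.copy()
        gamma[j] -= 1  # remove a single copy of j
        out.append((j, gamma))
    return out


def copies_in_A(multiset):
    """Copies in A_n: one per appearance of j in Gamma (j must lie in Gamma)."""
    return sum(gamma[j] for j, gamma in decompositions(multiset))


def copies_in_B(multiset):
    """Copies in B_n: one per distinct index j."""
    return len(decompositions(multiset))


example = [1, 4, 5, 4, 7]   # ell = 5 elements, k = 4 distinct indices
a_copies = copies_in_A(example)
b_copies = copies_in_B(example)
```

For the example, the counts are $\ell - k = 1$ and $k = 4$, matching the multiplicities used in the decompositions of the two unions.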

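Denote a disjoint union of a distinct copies of some collection
$\mathcal{C}$ by $a \cdot \mathcal{C}$. Then we can write $\mathcal{A}_n$,
$\mathcal{B}_n$ as

$$\mathcal{A}_n = 0 \cdot \mathcal{C}_\ell^{\ell} \cup 1 \cdot \mathcal{C}_\ell^{\ell-1} \cup \dots \cup n \cdot \mathcal{C}_\ell^{\ell-n} \qquad \text{(A-2)}$$

$$\mathcal{B}_n = \ell \cdot \mathcal{C}_\ell^{\ell} \cup (\ell-1) \cdot \mathcal{C}_\ell^{\ell-1} \cup \dots \cup (\ell-n) \cdot \mathcal{C}_\ell^{\ell-n} \qquad \text{(A-3)}$$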
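We prove the following inequality:

$$\mathrm{var}(x|_{\mathcal{B}_n}) \le \mathrm{var}(x|_{\mathcal{A}_n}).$$

Since $E_q = 0$ by our assumption, the expectations of $x|_{\mathcal{A}_n}$
and $x|_{\mathcal{B}_n}$ also equal zero: by an argument similar to the one
presented in the first part of the proof, $E(x|_{\mathcal{A}_n}) =
E(x|_{\mathcal{B}_n}) = \ell \cdot E_q$. Thus we have

$$\mathrm{var}(x|_{\mathcal{A}_n}) = E(x^2|_{\mathcal{A}_n}), \qquad \mathrm{var}(x|_{\mathcal{B}_n}) = E(x^2|_{\mathcal{B}_n}).$$

For brevity we introduce the notation $q_\Gamma = \sum_{k \in \Gamma} q_k$.
Then $\mathrm{var}(x|_{\mathcal{A}_n})$ reads as

$$\mathrm{var}(x|_{\mathcal{A}_n}) = \frac{1}{|\mathcal{A}_n|} \sum_{\Gamma} \sum_{j \in \Gamma} (q_\Gamma^2 + q_j^2 + 2 q_\Gamma q_j).$$

Similarly, we have

$$\mathrm{var}(x|_{\mathcal{B}_n}) = \frac{1}{|\mathcal{B}_n|} \sum_{\Gamma} \sum_{j \in \Omega} (q_\Gamma^2 + q_j^2 + 2 q_\Gamma q_j).$$

The summand $q_\Gamma^2$ contributes the same average over $\Gamma$ to both
expressions, hence cancels out in their difference. We consider the term
$\frac{1}{|\mathcal{A}_n|} \sum_{\Gamma} \sum_{j \in \Gamma} q_j^2$ in
$\mathrm{var}(x|_{\mathcal{A}_n})$. The element $q_a^2$ appears in it the same
number of times for every $a \in \Omega$. Hence
$\frac{1}{|\mathcal{A}_n|} \sum_{\Gamma} \sum_{j \in \Gamma} q_j^2 =
\frac{1}{L} \sum_{a \in \Omega} q_a^2$. By the same argument, in
the expression of $\mathrm{var}(x|_{\mathcal{B}_n})$ we have
$\frac{1}{|\mathcal{B}_n|} \sum_{\Gamma} \sum_{j \in \Omega} q_j^2 =
\frac{1}{L} \sum_{a \in \Omega} q_a^2$,
hence this quadratic term also cancels out. In light of these
observations, we obtain

$$\mathrm{var}(x|_{\mathcal{A}_n}) - \mathrm{var}(x|_{\mathcal{B}_n}) = \frac{2}{|\mathcal{A}_n|} \sum_{\Gamma} q_\Gamma \sum_{j \in \Gamma} q_j - \frac{2}{|\mathcal{B}_n|} \sum_{\Gamma} q_\Gamma \sum_{j \in \Omega} q_j.$$

Here we substitute again $q_\Gamma$ for $\sum_{j \in \Gamma} q_j$ and recall
$\sum_{j \in \Omega} q_j = L \cdot E_q = 0$. Thus, we have

$$\mathrm{var}(x|_{\mathcal{A}_n}) - \mathrm{var}(x|_{\mathcal{B}_n}) = \frac{2}{|\mathcal{A}_n|} \sum_{\Gamma} q_\Gamma^2 \ge 0.$$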
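In order to use this result for the proof of the theorem, we make
the following observations: Denote $v_n = \mathrm{var}(x|_{\mathcal{C}_\ell^{n}})$
and $s_n = |\mathcal{C}_\ell^{n}|$. By virtue of the decomposition (A-2),
$\mathrm{var}(x|_{\mathcal{A}_n})$ can be written as

$$\mathrm{var}(x|_{\mathcal{A}_n}) = \frac{\sum_{k=1}^{n} k \, s_{\ell-k} v_{\ell-k}}{\sum_{k=1}^{n} k \, s_{\ell-k}}$$

(see Proposition 5.1). Similarly, we have

$$\mathrm{var}(x|_{\mathcal{B}_n}) = \frac{\sum_{k=0}^{n} (\ell-k) \, s_{\ell-k} v_{\ell-k}}{\sum_{k=0}^{n} (\ell-k) \, s_{\ell-k}}.$$

We compute the coefficients of $v_i$ in the expression

$$\mathrm{var}(x|_{\mathcal{A}_n}) - \mathrm{var}(x|_{\mathcal{B}_n}).$$

For any $0 \le k \le n$, the coefficient of $v_{\ell-k}$ is

$$\frac{1}{Den} \, s_{\ell-k} \left( k \sum_{i=0}^{n} (\ell-i) \, s_{\ell-i} - (\ell-k) \sum_{i=1}^{n} i \, s_{\ell-i} \right)$$

with

$$Den = \sum_{i=1}^{n} i \, s_{\ell-i} \cdot \sum_{i=0}^{n} (\ell-i) \, s_{\ell-i}.$$

We denote $\alpha_{\ell-k} = \ell \sum_{i=0}^{n} (k-i) \, s_{\ell-i}$, for
$0 \le k \le n$, in order to write the above difference as

$$0 \le \mathrm{var}(x|_{\mathcal{A}_n}) - \mathrm{var}(x|_{\mathcal{B}_n}) = \frac{1}{Den} \sum_{k=0}^{n} \alpha_{\ell-k} s_{\ell-k} v_{\ell-k}. \qquad \text{(A-4)}$$

The constant $\frac{1}{Den}$ is positive, since $n < \ell$. Thus, it can be
omitted while preserving the inequality:

$$0 \le \sum_{k=0}^{n} \alpha_{\ell-k} s_{\ell-k} v_{\ell-k}. \qquad \text{(A-5)}$$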
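The coefficients in this expression have the following two properties:

1. $\sum_{k=0}^{n} s_{\ell-k} \alpha_{\ell-k} = 0$.

2. $\forall j, \; \alpha_{j-1} - \alpha_j = \ell \sum_{i=0}^{n} s_{\ell-i}$.

To show the first equality, we consider the sum in (1) as a linear
combination of the elements $s_{\ell-i} s_{\ell-j}$, $i, j = 0, \dots, n$.
The coefficient of $s_{\ell-i} s_{\ell-i}$ is zero for any i. For any
$i \ne j$, $s_{\ell-i} s_{\ell-j}$ appears in just two components of the sum
above, namely, $s_{\ell-i} \alpha_{\ell-i}$ and $s_{\ell-j} \alpha_{\ell-j}$.
Specifically, $\alpha_{\ell-i}$ contains the summand $\ell (i-j) s_{\ell-j}$,
and $\alpha_{\ell-j}$ contains the summand $\ell (j-i) s_{\ell-i}$; therefore
in the sum $s_{\ell-i} \alpha_{\ell-i} + s_{\ell-j} \alpha_{\ell-j}$ the
coefficient of $s_{\ell-i} s_{\ell-j}$ is zero. The second property follows
from the definition of $\alpha_j$. In light of the first property, (A-5)
can be written as

$$\left( \sum_{k=1}^{n} \alpha_{\ell-k} s_{\ell-k} \right) v_\ell \le \sum_{k=1}^{n} \alpha_{\ell-k} s_{\ell-k} v_{\ell-k}. \qquad \text{(A-6)}$$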

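Equipped with these observations, we prove, by induction on n,
the inequality

$$\mathrm{var}(x|_{\mathcal{D}^0}) \le \mathrm{var}(x|_{\mathcal{D}^n})$$

for any $n = 1, \dots, \ell - 1$. The theorem follows for $n = \ell - 1$. By
Proposition 5.1, $\mathrm{var}(x|_{\mathcal{D}^n}) =
\frac{\sum_{i=0}^{n} s_{\ell-i} v_{\ell-i}}{\sum_{i=0}^{n} s_{\ell-i}}$, and
$\mathrm{var}(x|_{\mathcal{D}^0})$ is just $v_\ell$.
Thus we need to prove

$$v_\ell \le \frac{\sum_{i=0}^{n} s_{\ell-i} v_{\ell-i}}{\sum_{i=0}^{n} s_{\ell-i}}$$

or

$$\left( \sum_{i=1}^{n} s_{\ell-i} \right) v_\ell \le \sum_{i=1}^{n} s_{\ell-i} v_{\ell-i}. \qquad \text{(A-7)}$$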

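For n = 1, (A-6) reads as

$$\alpha_{\ell-1} s_{\ell-1} v_\ell \le \alpha_{\ell-1} s_{\ell-1} v_{\ell-1}.$$

Here $\alpha_{\ell-1} = \ell s_\ell > 0$, thus we obtain the inequality

$$v_\ell \le v_{\ell-1}$$

as required. Now, we assume by induction that inequality (A-7) holds
up to n - 1 and prove it for n. We use (A-6):

$$(E1): \quad \left( \sum_{k=1}^{n} \alpha_{\ell-k} s_{\ell-k} \right) v_\ell \le \sum_{k=1}^{n} \alpha_{\ell-k} s_{\ell-k} v_{\ell-k}.$$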

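This inequality undergoes a series of transformations designed to bring
it to the form of (A-7).

First, we have $\alpha_{\ell-1} < \alpha_{\ell-2}$. Since $v_\ell \le v_{\ell-1}$
by the proof for n = 1, we have the inequality

$$(d1): \quad (\alpha_{\ell-2} - \alpha_{\ell-1}) s_{\ell-1} v_\ell \le (\alpha_{\ell-2} - \alpha_{\ell-1}) s_{\ell-1} v_{\ell-1}.$$

Adding (d1) to the inequality (E1), we arrive at

$$(E2): \quad \left( \alpha_{\ell-2} (s_{\ell-1} + s_{\ell-2}) + \sum_{k=3}^{n} \alpha_{\ell-k} s_{\ell-k} \right) v_\ell \le \alpha_{\ell-2} (s_{\ell-1} v_{\ell-1} + s_{\ell-2} v_{\ell-2}) + \sum_{k=3}^{n} \alpha_{\ell-k} s_{\ell-k} v_{\ell-k}.$$

Second, by the induction assumption for n = 2 we have the inequality

$$(s_{\ell-1} + s_{\ell-2}) v_\ell \le s_{\ell-1} v_{\ell-1} + s_{\ell-2} v_{\ell-2}.$$

Also, $\alpha_{\ell-2} < \alpha_{\ell-3}$ as noticed earlier. Then we can
construct the next inequality in order to add it to (E2):

$$(d2): \quad (\alpha_{\ell-3} - \alpha_{\ell-2})(s_{\ell-1} + s_{\ell-2}) v_\ell \le (\alpha_{\ell-3} - \alpha_{\ell-2})(s_{\ell-1} v_{\ell-1} + s_{\ell-2} v_{\ell-2}).$$

This results in the following expression:

$$(E3): \quad \left( \alpha_{\ell-3} \sum_{i=1}^{3} s_{\ell-i} + \sum_{k=4}^{n} \alpha_{\ell-k} s_{\ell-k} \right) v_\ell \le \alpha_{\ell-3} \sum_{i=1}^{3} s_{\ell-i} v_{\ell-i} + \sum_{k=4}^{n} \alpha_{\ell-k} s_{\ell-k} v_{\ell-k}.$$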

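In this fashion we make n - 1 steps resulting in the inequality

$$(E(n)): \quad \left( \alpha_{\ell-n} \sum_{i=1}^{n} s_{\ell-i} \right) v_\ell \le \alpha_{\ell-n} \sum_{i=1}^{n} s_{\ell-i} v_{\ell-i}.$$

Notice that $\alpha_{\ell-n}$ is positive: $\alpha_{\ell-n} = \ell \, (n s_\ell +
(n-1) s_{\ell-1} + \dots + s_{\ell-n+1})$. Dividing by it, we obtain the
desired result (A-7). As mentioned, the theorem follows
for $n = \ell - 1$. □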
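
The variance inequality of Theorem C can also be checked by exhaustive enumeration on a small example. As before, the sketch assumes $x_\ell$ is uniform over $\ell$-subsets and $y_\ell$ is uniform over non-ordered $\ell$-multisets; the capacity vector `q` is made up for illustration.

```python
from itertools import combinations, combinations_with_replacement
from statistics import pvariance


def theorem_c_variances(q, size):
    """Population variances of the index-set sums without and with repetitions."""
    idx = range(len(q))
    x_vals = [sum(q[k] for k in s) for s in combinations(idx, size)]
    y_vals = [sum(q[k] for k in s)
              for s in combinations_with_replacement(idx, size)]
    return pvariance(x_vals), pvariance(y_vals)


q = [0.0, 0.0, 3.0]   # toy capacity vector
var_x, var_y = theorem_c_variances(q, 2)
```

In this toy case the variance over multisets is more than twice the variance over plain subsets, in line with the direction of the inequality in (A-1).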

Appendix B 

We prove the equality of expectations

$$E(x_\ell) = E(y_\ell), \qquad \text{(B-1)}$$

for the random variables $x_\ell$ and $y_\ell$ defined in the proof of
Conjecture B. Recall that $y_\ell$ is a sum of $\frac{\ell}{2}$ values from Q,
uniformly distributed over this matrix, therefore $E(y_\ell) =
\frac{\ell}{2} E_Q$. We show $E(x_\ell) = \frac{\ell}{2} E_Q$, too, by
considerations of symmetry, similar to those used in the proof of Theorem
A, part 1.

Namely, we consider the totality $\mathcal{P}_\ell$ of partitions of all
$\ell$-sized supports $\Lambda \subset \Omega$ into ordered pairs of indices.
An element in this collection is therefore a pair $(\Lambda, X_\Lambda)$. We
clarify that the index sets $\Lambda \subset \Omega$ are
chosen without repetitions and up to a permutation of their elements.
Now, let $(i, j)$ be an ordered pair of indices from $\Omega$. We argue that
the number of appearances of this pair in the elements of $\mathcal{P}_\ell$
does not depend on the choice of i and j. Indeed, this number is just the
size of the collection $\mathcal{P}_{\ell-2}$, built for the submatrix of Q
with the i-th and j-th rows and columns missing.

Since $x_\ell(\Lambda, X_\Lambda)$ is the sum
$\sum_{(i,j) \in X_\Lambda} Q(i, j)$, we conclude that all
the elements $Q(i, j)$ contribute to the value of $x_\ell$ with equal
probability, hence $E(x_\ell) = \frac{\ell}{2} E_Q$ as desired.
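
The symmetry argument can be checked by brute force on a small matrix. The sketch below enumerates every support, every partition of it into pairs, and both orderings of each pair. It assumes that $E_Q$ denotes the mean of the off-diagonal entries of Q, which is our reading and is not stated explicitly in this appendix; the matrix entries are made up.

```python
from itertools import combinations, product
from statistics import mean


def pairings(items):
    """Yield all ways to split an even-sized list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def x_values(Q, size):
    """Values of x over all supports and all partitions into ordered pairs."""
    values = []
    for support in combinations(range(len(Q)), size):
        for pairing in pairings(list(support)):
            # Each pair can be ordered in two ways.
            for flips in product([False, True], repeat=len(pairing)):
                total = 0.0
                for (i, j), flip in zip(pairing, flips):
                    total += Q[j][i] if flip else Q[i][j]
                values.append(total)
    return values


# Toy (non-symmetric) matrix with a zero diagonal, for illustration only.
Q = [[0, 1, 2, 3],
     [4, 0, 5, 6],
     [7, 8, 0, 9],
     [10, 11, 12, 0]]
E_Q = mean(Q[i][j] for i in range(4) for j in range(4) if i != j)
E_x2 = mean(x_values(Q, 2))
E_x4 = mean(x_values(Q, 4))
```

Both support sizes give the mean $\frac{\ell}{2} E_Q$, since every ordered pair $(i, j)$ occurs equally often in the enumeration.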

Now we provide empirical evidence for the claim

$$\mathrm{var}(x_\ell) \le \mathrm{var}(y_\ell). \qquad \text{(B-2)}$$

The statistical data below supports this inequality. While the variance of
$y_\ell$ is known precisely, for $x_\ell$ we estimate it by drawing 10^ random
subsets of indices for each support size up to half the signal dimension
of the dictionary. Results are presented in Figure 2. The computation
is carried out for a number of dictionary sizes on dictionary D-Random.
As can be seen from these figures, the gap between $\mathrm{var}(x_\ell)$ and
$\mathrm{var}(y_\ell)$ is roughly proportional to the support size.

The same experiments on dictionary D-DCT display different results:
the variances of both variables coincide. As the number of samples grows,
we observe that the difference of the variance values, for all support
sizes, tends to zero. We conclude that for this specific dictionary,
(B-2) is an equality.

Figure 2: The variances of $x_\ell$ and $y_\ell$ (scaled by 10^ )